{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3ea857b1",
   "metadata": {},
   "source": [
    "# 使用LangChain构建本地RAG应用\n",
    "\n",
    "该教程假设您已经熟悉以下概念:\n",
    "\n",
    "- [Chat Models](https://python.langchain.com/v0.2/docs/concepts/#chat-models)\n",
    "- [Chaining runnables](https://python.langchain.com/v0.2/docs/how_to/sequence/)\n",
    "- [Embeddings](https://python.langchain.com/v0.2/docs/concepts/#embedding-models)\n",
    "- [Vector stores](https://python.langchain.com/v0.2/docs/concepts/#vector-stores)\n",
    "- [Retrieval-augmented generation](https://python.langchain.com/v0.2/docs/tutorials/rag/)\n",
    "\n",
    "\n",
    "很多流行的项目如 [llama.cpp](https://github.com/ggerganov/llama.cpp), [Ollama](https://github.com/ollama/ollama), 和 [llamafile](https://github.com/Mozilla-Ocho/llamafile) 显示了本地环境中运行大语言模型的重要性。\n",
    "\n",
    "LangChain 与许多可以本地运行的 [开源 LLM 供应商](https://python.langchain.com/v0.2/docs/how_to/local_llms) 有集成，[Ollama](https://python.langchain.com/v0.2/docs/integrations/providers/ollama/) 便是其中之一。\n",
    "\n",
    "\n",
    "## 环境设置\n",
    "\n",
    "首先，我们需要进行环境设置。\n",
    "\n",
    "Ollama 的 [GitHub仓库](https://github.com/ollama/ollama) 中提供了详细的说明, 简单总结如下:\n",
    "\n",
    "- [下载](https://ollama.com/download) 并运行 Ollama 应用程序\n",
    "- 从命令行, 参考 [Ollama 模型列表](https://ollama.com/library) 和 [文本嵌入模型列表](https://python.langchain.com/v0.2/docs/integrations/text_embedding/) 拉取模型。在该教程中，我们以 `llama3.1:8b` 和 `nomic-embed-text` 为例:\n",
    "  - 命令行输入 `ollama pull llama3.1:8b`，拉取通用的开源大语言模型 `llama3.1:8b` \n",
    "  - 命令行输入 `ollama pull nomic-embed-text` 拉取 [文本嵌入模型](https://ollama.com/search?c=embedding) `nomic-embed-text`\n",
    "- 当应用运行时，所有模型将自动在 `localhost:11434` 上启动\n",
    "- 注意，你的模型选择需要考虑你的本地硬件能力，该教程的参考显存大小 `GPU Memory > 8GB`\n",
    "\n",
    "接下来，安装本地嵌入、向量存储和模型推理所需的包。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "a7dc1ec5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: you may need to restart the kernel to use updated packages.\n",
      "Note: you may need to restart the kernel to use updated packages.\n",
      "Note: you may need to restart the kernel to use updated packages.\n"
     ]
    }
   ],
   "source": [
    "# langchain_community\n",
    "%pip install -qU langchain langchain_community\n",
    "\n",
    "# Chroma\n",
    "%pip install -qU langchain_chroma\n",
    "\n",
    "# Ollama\n",
    "%pip install -qU langchain_ollama"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02b7914e",
   "metadata": {},
   "source": [
    "You can also [see this page](/docs/integrations/text_embedding/) for a full list of available embeddings models"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e7543fa",
   "metadata": {},
   "source": [
    "## 文档加载\n",
    "\n",
    "现在让我们加载并分割一个示例文档。\n",
    "\n",
    "我们将以 Lilian Weng 的关于 Agent 的 [博客](https://lilianweng.github.io/posts/2023-06-23-agent/) 为例。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f8cf5765",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "from langchain_community.document_loaders import WebBaseLoader\n",
    "\n",
    "loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n",
    "data = loader.load()\n",
    "\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
    "all_splits = text_splitter.split_documents(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "131d5059",
   "metadata": {},
   "source": [
    "接着，初始化向量存储。 我们使用的文本嵌入模型是 [`nomic-embed-text`](https://ollama.com/library/nomic-embed-text) 。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "fdce8923",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_chroma import Chroma\n",
    "from langchain_ollama import OllamaEmbeddings\n",
    "\n",
    "local_embeddings = OllamaEmbeddings(model=\"nomic-embed-text\")\n",
    "\n",
    "vectorstore = Chroma.from_documents(documents=all_splits, embedding=local_embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29137915",
   "metadata": {},
   "source": [
    "现在我们得到了一个本地的向量数据库! 来简单测试一下相似度检索:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "b0c55e98",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "question = \"What are the approaches to Task Decomposition?\"\n",
    "docs = vectorstore.similarity_search(question)\n",
    "len(docs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "32b43339",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Document(metadata={'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\"}, page_content='Task decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.')"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "docs[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fcf81052",
   "metadata": {},
   "source": [
    "接下来实例化大语言模型 `llama3.1:8b` 并测试模型推理是否正常："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "af1176bb-d52a-4cf0-b983-8b7433d45b4f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_ollama import ChatOllama\n",
    "\n",
    "model = ChatOllama(\n",
    "    model=\"llama3.1:8b\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "bf0162e0-8c41-4344-88ae-ff2bbaeb12eb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**The scene is set: a packed arena, the crowd on their feet. In the blue corner, we have Stephen Colbert, aka \"The O'Reilly Factor\" himself. In the red corner, the challenger, John Oliver. The judges are announced as Tina Fey, Larry Wilmore, and Patton Oswalt. The crowd roars as the two opponents face off.**\n",
      "\n",
      "**Stephen Colbert (aka \"The Truth with a Twist\"):**\n",
      "Yo, I'm the king of satire, the one they all fear\n",
      "My show's on late, but my jokes are clear\n",
      "I skewer the politicians, with precision and might\n",
      "They tremble at my wit, day and night\n",
      "\n",
      "**John Oliver:**\n",
      "Hold up, Stevie boy, you may have had your time\n",
      "But I'm the new kid on the block, with a different prime\n",
      "Time to wake up from that 90s coma, son\n",
      "My show's got bite, and my facts are never done\n",
      "\n",
      "**Stephen Colbert:**\n",
      "Oh, so you think you're the one, with the \"Last Week\" crown\n",
      "But your jokes are stale, like the ones I wore down\n",
      "I'm the master of absurdity, the lord of the spin\n",
      "You're just a British import, trying to fit in\n",
      "\n",
      "**John Oliver:**\n",
      "Stevie, my friend, you may have been the first\n",
      "But I've got the skill and the wit, that's never blurred\n",
      "My show's not afraid, to take on the fray\n",
      "I'm the one who'll make you think, come what may\n",
      "\n",
      "**Stephen Colbert:**\n",
      "Well, it's time for a showdown, like two old friends\n",
      "Let's see whose satire reigns supreme, till the very end\n",
      "But I've got a secret, that might just seal your fate\n",
      "My humor's contagious, and it's already too late!\n",
      "\n",
      "**John Oliver:**\n",
      "Bring it on, Stevie! I'm ready for you\n",
      "I'll take on your jokes, and show them what to do\n",
      "My sarcasm's sharp, like a scalpel in the night\n",
      "You're just a relic of the past, without a fight\n",
      "\n",
      "**The judges deliberate, weighing the rhymes and the flow. Finally, they announce their decision:**\n",
      "\n",
      "Tina Fey: I've got to go with John Oliver. His jokes were sharper, and his delivery was smoother.\n",
      "\n",
      "Larry Wilmore: Agreed! But Stephen Colbert's still got that old-school charm.\n",
      "\n",
      "Patton Oswalt: You know what? It's a tie. Both of them brought the heat!\n",
      "\n",
      "**The crowd goes wild as both opponents take a bow. The rap battle may be over, but the satire war is just beginning...\n"
     ]
    }
   ],
   "source": [
    "response_message = model.invoke(\n",
    "    \"Simulate a rap battle between Stephen Colbert and John Oliver\"\n",
    ")\n",
    "\n",
    "print(response_message.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d58838ae",
   "metadata": {},
   "source": [
    "## 构建 Chain 表达形式\n",
    "\n",
    "我们可以通过传入检索到的文档和简单的 prompt 来构建一个 `summarization chain` 。\n",
    "\n",
    "它使用提供的输入键值格式化提示模板，并将格式化后的字符串传递给指定的模型："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "18a3716d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The main themes in these documents are:\\n\\n1. **Task Decomposition**: The process of breaking down complex tasks into smaller, manageable subgoals is crucial for efficient task handling.\\n2. **Autonomous Agent System**: A system powered by Large Language Models (LLMs) that can perform planning, reflection, and refinement to improve the quality of final results.\\n3. **Challenges in Planning and Decomposition**:\\n\\t* Long-term planning and task decomposition are challenging for LLMs.\\n\\t* Adjusting plans when faced with unexpected errors is difficult for LLMs.\\n\\t* Humans learn from trial and error, making them more robust than LLMs in certain situations.\\n\\nOverall, the documents highlight the importance of task decomposition and planning in autonomous agent systems powered by LLMs, as well as the challenges that still need to be addressed.'"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "\n",
    "prompt = ChatPromptTemplate.from_template(\n",
    "    \"Summarize the main themes in these retrieved docs: {docs}\"\n",
    ")\n",
    "\n",
    "\n",
    "# 将传入的文档转换成字符串的形式\n",
    "def format_docs(docs):\n",
    "    return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
    "\n",
    "\n",
    "chain = {\"docs\": format_docs} | prompt | model | StrOutputParser()\n",
    "\n",
    "question = \"What are the approaches to Task Decomposition?\"\n",
    "\n",
    "docs = vectorstore.similarity_search(question)\n",
    "\n",
    "chain.invoke(docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3cce6977-52e7-4944-89b4-c161d04f6698",
   "metadata": {},
   "source": [
    "## 简单QA\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "67cefb46-acd3-4c2a-a8f6-b62c7c3e30dc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Task decomposition can be done through (1) simple prompting using LLM, (2) task-specific instructions, or (3) human inputs. This approach helps break down large tasks into smaller, manageable subgoals for efficient handling of complex tasks. It enables agents to plan ahead and improve the quality of final results through reflection and refinement.'"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain_core.runnables import RunnablePassthrough\n",
    "\n",
    "RAG_TEMPLATE = \"\"\"\n",
    "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\n",
    "\n",
    "<context>\n",
    "{context}\n",
    "</context>\n",
    "\n",
    "Answer the following question:\n",
    "\n",
    "{question}\"\"\"\n",
    "\n",
    "rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)\n",
    "\n",
    "chain = (\n",
    "    RunnablePassthrough.assign(context=lambda input: format_docs(input[\"context\"]))\n",
    "    | rag_prompt\n",
    "    | model\n",
    "    | StrOutputParser()\n",
    ")\n",
    "\n",
    "question = \"What are the approaches to Task Decomposition?\"\n",
    "\n",
    "docs = vectorstore.similarity_search(question)\n",
    "\n",
    "# Run\n",
    "chain.invoke({\"context\": docs, \"question\": question})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "821729cb",
   "metadata": {},
   "source": [
    "## 带有检索的QA\n",
    "\n",
    "最后，我们带有语义检索功能的 QA 应用（本地 RAG 应用），可以根据用户问题自动从向量数据库中检索语义上最相近的文档片段："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "86c7a349",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = vectorstore.as_retriever()\n",
    "\n",
    "qa_chain = (\n",
    "    {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
    "    | rag_prompt\n",
    "    | model\n",
    "    | StrOutputParser()\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "112ca227",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Task decomposition can be done through (1) simple prompting in Large Language Models (LLM), (2) using task-specific instructions, or (3) with human inputs. This process involves breaking down large tasks into smaller, manageable subgoals for efficient handling of complex tasks.'"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "question = \"What are the approaches to Task Decomposition?\"\n",
    "\n",
    "qa_chain.invoke(question)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e75d3e9e",
   "metadata": {},
   "source": [
    "## 总结\n",
    "\n",
    "恭喜，至此，你已经完整的实现了一个基于 Langchain 框架和本地模型构建的 RAG 应用。你可以在教程的基础上替换本地模型来尝试不同模型的效果和能力，或进一步进行扩展，丰富应用的能力和表现力，或者添加更多实用有趣的功能。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "89cf3894-ac32-4b61-894a-91d8deddf797",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "langchain",
   "language": "python",
   "name": "langchain"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
